151 research outputs found

    Mixing representation levels: The hybrid approach to automatic text generation

    Natural language generation (NLG) systems map non-linguistic representations into strings of words through a number of steps, using intermediate representations at various levels of abstraction. Template-based systems, by contrast, tend to use only one representation level, i.e. fixed strings, which are combined, possibly in a sophisticated way, to generate the final text. In some circumstances, it may be profitable to combine NLG and template-based techniques. The issue of combining generation techniques can be seen in more abstract terms as the issue of mixing levels of representation with different degrees of linguistic abstraction. This paper aims to define a reference architecture for systems using mixed representations. We argue that mixed representations can be used without abandoning a linguistically grounded approach to language generation. Comment: 6 pages

    Exploiting Lexical Resources for Therapeutic Purposes: the Case of WordNet and STaRS.sys

    In this paper, we present an ongoing project aiming at extending the WordNet lexical database by encoding commonsense featural knowledge elicited from language speakers. This extension of WordNet is required in the framework of the STaRS.sys project, which has the goal of building tools that support speech therapists in preparing exercises to be submitted to aphasic patients for rehabilitation purposes. We review some preliminary results and illustrate which extensions of the existing WordNet model are needed to accommodate the encoding of commonsense (featural) knowledge.

    A Feature Type Classification for Therapeutic Purposes: A Preliminary Evaluation with Non-Expert Speakers

    We propose a feature type classification intended for use in a therapeutic context. This scenario motivates our need for an easily usable and cognitively plausible classification. Nevertheless, our proposal has both a practical and a theoretical outcome, and its applications range from computational linguistics to psycholinguistics. An evaluation through inter-coder agreement has been performed to highlight the strengths of our proposal and to identify some improvements for the future.

    Encoding Commonsense Lexical Knowledge into WordNet

    In this paper, we propose an extension of the WordNet conceptual model, with the final purpose of encoding the commonsense lexical knowledge associated with words used in everyday life. The extended model has been defined starting from the short descriptions generated by naïve speakers in relation to target concepts (i.e. feature norms). Although this proposal has been developed primarily for therapeutic purposes, it can be seen as a generalization of the original WordNet model that takes into account a much wider and more systematic set of semantic relations. The extended model also strengthens the psycholinguistic vocation of the WordNet model, since a featural representation of concepts is nowadays assumed by most models of human semantic memory. To test our proposal, we conducted a feature elicitation experiment and collected descriptions of 50 concepts from 60 participants. Problematic issues related to the encoding of this information into WordNet are discussed and preliminary results are presented.

    Evaluating cross-language annotation transfer in the MultiSemCor corpus

    In this paper we illustrate and evaluate an approach to the creation of high-quality linguistically annotated resources based on the exploitation of aligned parallel corpora. This approach is based on the assumption that if a text in one language has been annotated and its translation has not, annotations can be transferred from the source text to the target using word alignment as a bridge. The transfer approach has been tested in the creation of the MultiSemCor corpus, an English/Italian parallel corpus created on the basis of the English SemCor corpus. In MultiSemCor, texts are aligned at the word level and semantically annotated with a shared inventory of senses. We present some experiments carried out to evaluate the different steps involved in the methodology. The results of the evaluation suggest that cross-language annotation transfer is a promising solution, allowing existing (mostly English) annotated resources to be exploited to bootstrap the creation of annotated corpora in new (resource-poor) languages with greatly reduced human effort.
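
    The transfer idea in this abstract (copy an annotation from a source token to the target token it is aligned with) can be illustrated with a minimal Python sketch. This is not the MultiSemCor implementation: the function name, the sense labels, the alignment format (pairs of source and target token indices) and the rule of keeping the first label when several source tokens align to one target token are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def transfer_annotations(
    source_senses: List[Optional[str]],
    alignment: List[Tuple[int, int]],
    target_length: int,
) -> List[Optional[str]]:
    """Project sense labels from source tokens onto aligned target tokens.

    source_senses: one label per source token, or None if unannotated.
    alignment: (source_index, target_index) pairs (hypothetical format).
    target_length: number of tokens in the target sentence.
    """
    target_senses: List[Optional[str]] = [None] * target_length
    for src_idx, tgt_idx in alignment:
        sense = source_senses[src_idx]
        # Assumption: keep the first label that reaches a target token.
        if sense is not None and target_senses[tgt_idx] is None:
            target_senses[tgt_idx] = sense
    return target_senses


# Toy example with made-up sense labels (not real SemCor identifiers).
english_tokens = ["the", "dog", "sleeps"]
english_senses = [None, "dog#sense1", "sleep#sense1"]
italian_tokens = ["il", "cane", "dorme"]
word_alignment = [(0, 0), (1, 1), (2, 2)]

italian_senses = transfer_annotations(
    english_senses, word_alignment, len(italian_tokens)
)
for token, sense in zip(italian_tokens, italian_senses):
    print(token, sense)
```

    Target tokens with no aligned, annotated source token are left unannotated in this sketch, which is one simple way to handle alignment gaps.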

    English/Veneto Resource Poor Machine Translation with STILVEN

    The paper reports ongoing work on the implementation of a system for automatic translation from English to Veneto and vice versa. The system has no parallel texts to work on, because manual translations of this kind are almost nonexistent. The project is called STILVEN and is financed by the regional authorities of the Veneto Region in Italy. After the first year of activities, we managed to produce a prototype which handles Venetian questions that have a structure very close to English. We present problems related to Veneto, the basic ideas, their implementation and the results obtained.

    VenPro: A Morphological Analyzer for Venetan

    This document reports the process of extending MorphoPro to Venetan, a lesser-used language spoken in the north-eastern part of Italy. MorphoPro is the morphological component of TextPro, a suite of tools oriented towards a number of NLP tasks. In order to extend this component to Venetan, we developed a declarative representation of the morphological knowledge necessary to analyze and synthesize Venetan words. This task was challenging for several reasons, which are common to a number of lesser-used languages: although Venetan is widely used as an oral language in everyday life, its written usage is very limited; efforts to define a standard orthography and grammar are very recent and not well established; despite recent attempts to propose a unified orthography, no Venetan standard is widely used. In addition, Venetan has different geographical varieties and is strongly influenced by Italian.

    Recovering from Failure with the GraFo Left Corner Parser

    GraFo is a left-corner parser for Italian, based on explicit rules manually coded in a unification formalism. As the linguistic coverage of GraFo is still quite limited, the parser produces complete parse trees for only a small percentage of sentences. This paper presents a number of strategies to recover from GraFo parsing failures. The various techniques have been evaluated on the data provided by the EVALITA 2007 evaluation campaign.

    Extending WordNet with Syntagmatic Information

    In this paper we present a proposal to extend WordNet-like lexical databases by adding information about the co-occurrence of word meanings in texts. More specifically, we propose to add phrasets, i.e. sets of free combinations of words which are recurrently used to express a concept (we call them Recurrent Free Phrases). Phrasets are a useful source of information for different NLP tasks, particularly in a multilingual environment, where they help to manage lexical gaps. At least some recurrent free phrases can also be represented through a new set of syntagmatic (lexical and semantic) WordNet relations.